BoostML: An Adaptive Metric Learning for Nearest Neighbor Classification

نویسندگان

  • Nayyar Abbas Zaidi
  • David McG. Squire
  • David Suter
چکیده

The nearest neighbor classification/regression technique, besides its simplicity, is one of the most widely applied and well studied techniques for pattern recognition in machine learning. A nearest neighbor classifier assumes class conditional probabilities to be locally smooth. This assumption is often invalid in high dimensions and significant bias can be introduced when using the nearest neighbor rule. This effect can be mitigated to some extent by using a locally adaptive metric. In this work we present a detailed analysis of the introduction of bias in high dimensional machine learning data and propose an adaptive metric learning algorithm that learns an optimal metric at the query point using respective class distributions on the input measurement space. We learn a distance metric using a feature relevance measure inspired by boosting. The modified metric results in a smooth neighborhood that leads to better classification results. We tested our technique on major UCI machine learning databases and compared the results to state of the art techniques. Our method resulted in significant improvements in the performance of the K-NN classifier and also performed better than other techniques on major databases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Nearest Neighbor Classifier Based on Supervised Ellipsoid Clustering

Nearest neighbor classifier is a widely-used effective method for multi-class problems. However, it suffers from the problem of the curse of dimensionality in high dimensional space. To solve this problem, many adaptive nearest neighbor classifiers were proposed. In this paper, a locally adaptive nearest neighbor classification method based on supervised learning style which works well for the ...

متن کامل

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...

متن کامل

Adaptive Metric nearest Neighbor Classification

Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chisqu...

متن کامل

An Adaptive Metric Machine for Pattern Classification

Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-sq...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010